1.  Overview

The purpose of this interface is to establish an application-level
(ISO 2) protocol for query/retrieval applications.  The initial
implementation will provide a protocol for the DowQuest database
service provided by Dow Jones News Retrieval.  Workstation interfaces
will be implemented on the Macintosh as part of the WAIS (Wide Area
Information Server) project.  The intention is to provide a
sophisticated and expandable computer-to-computer interface for future
databases.

This protocol is based on the Z39.50-1988 ("the standard") Information
Retrieval Service Definitions and Protocol Specification for Library
Applications.  Each section of this document includes references in
square brackets "[]" to the appropriate section(s) in the Z39.50
specification.

The standard specifies an Open Systems Interconnection application
layer service definition and protocol specification for Information
Retrieval.  The Information Retrieval protocol allows an application
on one computer to query the database of another computer.  The
protocol specifies the procedures and structures for the intersystem
submission of a search request (including the syntax of the query),
requests for the transmission of database records located by a search,
the responses to those requests, access control, and resource control.

This is the last version of the WAIS protocol to be based on the
Z39.50 standard.  The next version will implement the newer SR-1
standard, which is based on Z39.50 but is written in ASN.1.

The WAIS extensions to the standard are primarily to support
"relevance feedback" queries.  (The standard currently supports a
boolean query syntax.)  The Present facility is not used, so that the
target system can be "stateless" (it always deletes Result-Sets).
Instead, a Type-1 query is used for text retrieval.  To retrieve
document number xxx, a search is performed with a query specifying
that System-Control-Number=xxx.

The WAIS extensions also enable the origin to request a range of
document text.  The Type-1 query is used as described in the previous
paragraph with the addition of Chunk-Code parameters.  The portion of
the document that matches the Chunk-Code values will be returned, e.g.
"System-Control-Number=xxx AND Line>1000 AND Line<=2000" would
return lines 1001 through 2000 of document xxx.
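
The range example above can be sketched as a small helper that builds
the query text.  This is only an illustration: the function name
line_range_query is hypothetical, and the textual form simply mirrors
the example in the paragraph above.  A real Type-1 query would be
encoded as the standard requires, not sent as a plain string.

```python
def line_range_query(doc_id: str, first_line: int, last_line: int) -> str:
    """Build the illustrative query text for lines first_line..last_line.

    Both bounds are inclusive.  The textual form mirrors the example in
    the specification; it is not the wire encoding of a Type-1 query.
    """
    if first_line < 1 or last_line < first_line:
        raise ValueError("expected 1 <= first_line <= last_line")
    # "Line>N" is exclusive, so the lower bound is one less than first_line.
    return (f"System-Control-Number={doc_id} "
            f"AND Line>{first_line - 1} AND Line<={last_line}")
```

Calling line_range_query("xxx", 1001, 2000) reproduces the query text
quoted above.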

This protocol requires the target system to return unique document IDs
in a Search-Response, labeled as System-Control-Number (see Appendix C
of the standard).  These document IDs are used by the origin (user
interface) to specify documents when requesting display of a document
or in relevance feedback searches.

Retrieval of large documents depends on the ability to specify a
range of a document in a search.  This will be specified with an
extension called "Chunks."  This version of the protocol does not have
a method for the origin and target to negotiate the available chunk
types.  Three chunk types are currently defined for DowQuest: Byte,
Line, and Paragraph.

For  efficiency  reasons  it  is  useful  to  refer  to  a  document  range  with 
large  "chunks"  that  have  been  marked  in  the  text  by  the  target  system. 
The  chunk  markers  and  IDs  are  not  displayed  to  the  user,  but  are  used 
by  the  origin  when  the  user  selects  a  range  of  a  document  for  a 
relevance  feedback  query.   The  Init-Response  APDU  is  extended  to 
provide  "chunk"  markers  and  sizes  which  may  be  used  to  specify 
document  ranges  in  relevance  feedback  queries. 

The User Information part of APDUs is used in more complex ways in
this extension than was originally envisioned in the standard.  In the
standard, the User Information part was a single Element of type
"any."  The WAIS protocol extension uses a User-Information-Field
preceding the set of elements in the user information part of an APDU.
This is the length in bytes of all the following elements, excluding
the User-Information-Length element.


1.1  Supported Facilities

For  the  June  1990  target  delivery  date  of  the  prototype  WAIS  system, 
DowQuest  will  support  only  2  facilities  from  the  Z39.50  specification. 

The "Initialization Facility" [3.2.1] includes an "Init APDU"
[4.1.1.1, Table A2] and an "Init-Response APDU" [4.1.1.2, Table A3].

The "Search Facility" [3.2.2] includes a "Search APDU" [4.1.1.3, Table
A4] and a "Search-Response APDU" [4.1.1.4, Table A5].

"APDU"  means  "Application  Protocol  Data  Unit,"  which  is  a  unit  of  data 
passed  between  an  origin  (user  workstation)  and  target  (database 
server).  These and other terms are defined in section 2 of the Z39.50
specification.

The  Search  APDU  will  be  extended  to  have  a  new  query  type:  Type-3, 
"Relevance  Feedback  Query." 

The  Search-Response  APDU  will  be  modified  to  include  new  elements  in 
Database-Records,  including  Document-IDs  (used  for  relevance  feedback) 
and  other  fields,  specified  in  section  4  of  this  document. 

1.2  Unsupported  Facilities 

The  remaining  5  facilities  from  Z39.50  are  not  supported  in  the  WAIS 
prototype. 

The "Retrieval Facility" will not be supported in the WAIS prototype.
Document text will be retrieved using a Type-1 query based on
System-Control-Number (document ID).

The  "Result-Set-Delete  Facility"  is  not  needed  because  DowQuest  will 
always  delete  all  Result-Sets  after  returning  a  Search-Response  APDU. 

The  "Access  Control  Facility"  will  not  be  supported.   All  users  will 
have  access  to  all  data  in  DowQuest. 

The  "Accounting/Resource  Control  Facility"  will  not  be  supported. 
DowQuest  responses  have  a  maximum  size. 

The  "Termination  Facility"  is  not  needed  because  DowQuest  will  not 
store  any  state  about  user  sessions.   Each  request  and  response  will 
be  a  complete  transaction,  independent  of  all  others.   Either  the 
origin  or  the  target  may  abort  a  session  at  any  time. 

1.3  Conformance  with  Version  1  of  Z39.50 

1.3.1  Extensibility 

As specified in section 4.3 of the standard, WAIS systems will ignore
unknown data elements and options in received Init APDUs.

1.3.2  Static  Requirements 

The  DowQuest  system  will  conform  to  the  Static  Requirements  specified 
in  section  4.4.1  of  the  standard,  with  extensions  noted  in  this 
document,  except  that  it  will  NOT  support  general  boolean  Type-1 
queries.   The  Type-1  query  will  be  used  only  for  retrieval  of 
documents  based  on  System-Control-Number  and  Chunks. 

1.3.3  Dynamic  Requirements 

WAIS systems will conform to the Dynamic Requirements specified in
section 4.4.2 of the standard.  There are restrictions on the Type-1
Query.

1.3.4  Statement  Requirements 

DowQuest will be capable of acting in the role of target.  It supports
version 1 of the standard.

See section 1.2 of this document for unsupported facilities.

Result-Sets  will  always  be  unilaterally  deleted  by  DowQuest.   It  will 
not  accept  Search  APDUs  specifying  named  result  sets.   Each  input  and 
response  message  pair  is  a  complete,  independent  transaction.   Thus, 
multiple  users  may  share  a  single  session,  although  the  order  of 
responses  is  not  guaranteed  to  be  the  same  order  as  the  requests.   If 
multiple  users  share  a  connection,  the  origin  must  use  Reference-IDs 
to  identify  input/response  message  pairs. 
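
As a rough illustration of the Reference-ID requirement above, an
origin that multiplexes several users over one connection could keep a
table of outstanding requests.  The class and method names here are
hypothetical and not part of the protocol; the sketch only shows that
responses arriving out of order can still be paired with their
requests.

```python
class PendingRequests:
    """Track outstanding requests on one shared connection by Reference-ID."""

    def __init__(self):
        self._next_id = 1
        self._pending = {}  # Reference-ID -> user that issued the request

    def register(self, user):
        """Allocate a fresh Reference-ID for a request issued by `user`."""
        ref_id = self._next_id
        self._next_id += 1
        self._pending[ref_id] = user
        return ref_id

    def resolve(self, ref_id):
        """Return the user owning `ref_id` and forget the completed pair."""
        return self._pending.pop(ref_id)
```

Because each request/response pair is an independent transaction, the
origin needs no other per-session state to route a response.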

DowQuest  supports  element  set  names  in  Search  APDUs  as  specified  in 
section  4  of  this  document. 

The maximum number of database names that may be specified in a Search
APDU will be determined by the implementors.

1.4  Errors  in  the  Standard 

Table A7 on p. 43 of the standard is a copy of Table A6.  Table A7
should contain the fields defined in 4.1.1.6, p. 23.  Earlier
versions of the WAIS protocol specification contained the same error
in table B.6.


2.  Initialization  Facility 

DowQuest will accept an Init APDU at any time, and will always respond
with an Init-Response APDU.  Since DowQuest is stateless, the
Initialization facility is not required to begin a user session, but
it may be used at any time to get the system parameters.

The  Init-Response  APDU  may  specify  "chunk"  parameters  that  may  be  used 
to  specify  a  range  of  a  document  in  a  relevance  feedback  Type-3  Query. 
[???  The  chunk  negotiation  needs  to  be  defined  more  completely.] 

The  Init-Response  APDU  may  also  specify  newline  characters, 
non-displayable  field  markers,  and  highlight/non-highlight  markers, 
and  fields  describing  how  often  the  target  is  updated  and  when  the 
target  is  updated. 

2.1  Init  APDU 

The Init APDU requests information about the database service [3.2.1,
4.1.1.1, and Table A2].  Since DowQuest is stateless, Init is not
required to begin a user session.

The  Options  field  must  always  have  0="will  not  use"  for  the  Delete 
facility. 

See Appendix B.1 of this document for an example Init APDU.

2.2  Init-Response  APDU 

The Init-Response APDU provides information about the database service
[3.2.1, 4.1.1.2, and Table A3].

The  Options  field  will  always  have  0="will  not  support"  for  the 
access-control  and  resource-control  facilities. 

Implementation-Name  will  be  "DowQuest",  and  the  Implementation-Version 
will  be  set  by  the  implementors,  to  be  updated  as  new  versions  are 
released. 

Preferred-Message-Size  and  Maximum-Record-Size  will  be  determined 
during  the  implementation. 

See  Appendix  B.2  of  this  document  for  an  example  Init-Response  APDU. 

2.2.1  Chunk  IDs 

The  User-Information-Field  of  the  Init-Response  APDU  will  contain 
four  elements  indicating  ways  the  origin  may  specify  a  region  of  a 
document  to  be  used  in  a  relevance  feedback  Type-3  query.   The  region 
is  composed  of  a  range  of  "chunks"  such  as  bytes  or  paragraphs.   The 
elements  are: 

Search-Chunk-Code-Bitmap            O       bitmap
Present-Chunk-Code-Bitmap [???]     O       bitmap
Chunk-ID-Length                     C       integer
Chunk-Marker                        C       ASCII

Search-Chunk-Code-Bitmap specifies the chunk codes the target will
accept in Type-1 Queries in Search APDUs requesting display of
document regions.  The bitmap indicates with a "1" in a bit position
that the corresponding code number will be accepted by the target
system.  For example, to indicate that the target accepts
Chunk-Codes 1 and 3 in a Search APDU it would return
Search-Chunk-Code-Bitmap with bits 1 and 3 set to 1 and all other
bits 0.
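
The bitmap convention can be sketched as follows.  The bit numbering
is an assumption inferred from the sample APDUs in Appendix B, where a
value of 0x40 is annotated "bit 2" and 0x80 is annotated "bit 1": bit
1 is taken to be the most significant bit of a single byte.  The
function names are hypothetical, and Chunk-Code=0 is not representable
in this sketch.

```python
def chunk_code_bitmap(codes):
    """Return a one-byte bitmap with the bit for each Chunk-Code set.

    Assumption: bit 1 is the most significant bit (0x80), bit 2 is
    0x40, and so on, as the Appendix B samples suggest.
    """
    bitmap = 0
    for code in codes:
        if not 1 <= code <= 8:
            raise ValueError("this sketch only handles codes 1 through 8")
        bitmap |= 0x80 >> (code - 1)
    return bitmap


def accepted_chunk_codes(bitmap):
    """List the Chunk-Codes whose bits are set in a one-byte bitmap."""
    return [code for code in range(1, 9) if bitmap & (0x80 >> (code - 1))]
```

Under this assumption a target accepting Chunk-Codes 1 and 3 would
return the byte 0xA0.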

Initially, four Chunk-Codes are defined.  The default is 1 "Byte" (see
section 5 of this document):

Chunk-Code=0  "Document"
Chunk-Code=1  "Byte"
Chunk-Code=2  "Line"
Chunk-Code=3  "Paragraph"

(In  the  future  this  may  be  extended  to  include  other  measures,  such  as 
Word,  Page,  or  Chapter-ID.  Other  media  such  as  audio  might  use  chunks 
such  as  Song-ID  or  Seconds.   Video  might  use  Frame  or  Scene-ID.) 

Chunk-Code=1 "Byte" is the most general case.  With this chunk size,
Chunk-Marker and Chunk-ID-Length are not used.  The origin may
indicate ranges of a document in bytes by setting Chunk-Code=1 and
providing pairs of byte-offsets in a relevance feedback Type-3 query.
If any Chunk-Code > 1 is accepted, the target must also provide
Chunk-ID-Length and Chunk-Marker.

DowQuest  will  provide  Chunk-Code=3  (Paragraph-ID)  for  relevance 
feedback  Type-3  Queries,  and  Chunk-Code=2  (Line)  for  text  retrieval 
Type-1  Queries. 

[??? Need more general chunk mechanism for both tagged and counted
types, e.g. paragraphs are tagged, but lines are counted (each line is
"tagged" only by the presence of a newline).  This will be addressed
in the next version of the protocol.]

2.2.2  Other  Markers 

DowQuest will also provide elements in the User-Information-Field of
the Init-Response APDU indicating various non-displayable marker
fields.  These include:

Highlight-Marker        O       ASCII
De-Highlight-Marker     C       ASCII
Newline-Characters      O       ASCII

If Highlight-Marker is present, De-Highlight-Marker is required.

2.2.3  Other Information Elements

WAIS targets may provide elements describing how often and when the
database is updated:

Update-Frequency        O       [???]
Update-Times            O       [???]
[??? pricing info?]     O       [???]

[The format and tags of these fields are TBD.]


3.    Search  Facility 

3.1  Search  APDU 

The Search APDU will be implemented as defined in the standard [3.2.2,
4.1.1.3, and Table A4].  However, the Result-Set will always be
deleted by DowQuest immediately after returning a Search-Response
APDU, so the Replace-Indicator field in the Search APDU should be
"on," and Result-Set-Name is not used.  Search APDUs may not refer
to a Result-Set.  This enables DowQuest to be stateless.

The  Type-3  Relevance  Feedback  Query  syntax  is  outside  the  scope  of  the 
standard.   The  syntax  used  by  DowQuest  is  given  in  Appendix  A. 

DowQuest  will  support  the  Type-1  Query  syntax,  but  not  for  general 
boolean  queries.   Only  searches  specifying  System-Control-Number  (and 
possibly  Chunk  ranges)  are  supported. 

See  Appendix  B.3  of  this  document  for  an  example  Search  APDU. 

3.2  Search-Response  APDU 

The Search-Response APDU is almost the same as specified in the
standard [3.2.2, 4.1.1.4, and Table A5], with a new type of
Database/Diagnostic-Record.  The elements used in Database-Records
[3.2.2.1.5, A.1.3.1] are specified in section 4 of this document.

The Result-Set will always be deleted by DowQuest immediately
after sending a Search-Response APDU.

The default element set returned in each Database-Record by DowQuest
in a Search-Response APDU is "Document-Header," defined in section 5
of this document.

For records that are beyond the Medium-Set-Present-Number in the
Search APDU, DowQuest will return the "Document-Short-Header" element
set.  This will probably not happen in normal circumstances since
DowQuest returns a maximum of 16 documents.  The origin can request
the Date/Score/Headline/etc. elements by requesting a
Document-Headline element set in subsequent Search APDUs.  [??? Perhaps
we should use message-length or buffer sizes to control this, instead?]


See Appendix B.4 for an example Search-Response APDU.


4.  Element  Sets  supported  by  DowQuest 

The elements supported by a particular target are outside the Z39.50
standard [3.2.2.1.3].  DowQuest will support the following
Element-Set-Names.  These are used in Search and Search-Response
APDUs.  Element-Set-Names is an optional field in Search APDUs [Table
2, Table 3].

Elements marked with a "*" can only appear in a Search-Response APDU.
Their information is deleted with the Result-Set, so it is no longer
available when requesting text, i.e. the text, headline, and code
elements should only be used with Type-1 queries.

The  second  column  notes  whether  an  element  is  Required,  Optional,  or 
Conditional  in  a  given  APDU. 

The elements and their tag values are defined in section 5 of this
document.

4.3  Document-Header

A Search-Response APDU contains one variable element:

Seed-Words-Used         O       ASCII

The rest of this element set is returned by default for each
Database-Record in a Search-Response APDU:

System-Control-Number   R       ANY
Version-Number          O       integer
Score *                 O       integer
Best-Match *            O       integer
[???] Lines             O       integer
Document-Length         O       integer
Source                  O       ASCII
Date                    O       ASCII
Title                   C       ASCII
Geographic-Name         O       ASCII

4.4  Document-Text 

This  element  set  may  be  returned  for  each  Database-Record  in  a 
Search-Response  APDU  in  response  to  a  Type-1  query: 


Document-ID             R       ANY
Version-Number          O       integer
Document-Text           R       ASCII

4.5  Document-Short-Header

This element set is returned in the Database-Record in a
Search-Response APDU for documents that are beyond the
Medium-Set-Present-Number:

Document-ID             R       ANY
Version-Number          O       integer
Score *                 O       integer
Best-Match *            O       integer
Document-Length         R       integer

4.6  Document-Headline

This element set is returned in a Search-Response APDU when requested
in a Type-1 Query in a Search APDU for documents that were previously
returned with Document-Short-Header element sets because of size
restrictions:

Document-ID             R       ANY
Version-Number          O       integer
Source                  O       ASCII
Date                    O       ASCII
Headline                R       ASCII
Origin                  O       ASCII

4.7  Document-Long-Header

This element set may be optionally requested in a Search APDU to be
returned in a Search-Response APDU:

Document-ID             R       ANY
Version-Number          O       integer
Score *                 O       integer
Best-Match *            O       integer
Document-Length         R       integer
Source                  O       ASCII
Date                    O       ASCII
Headline                R       ASCII
Origin                  O       ASCII
Stock-Codes             O       ASCII
Company-Codes           O       ASCII
Industry-Codes          O       ASCII

[??? what about more general codes, e.g. author, pricing, copyright?]

4.8  Document-Codes

This element set is returned in a Search-Response APDU when requested
in a Search APDU:

Document-ID             R       ANY
Version-Number          O       integer
Stock-Codes             O       ASCII
Company-Codes           O       ASCII
Industry-Codes          O       ASCII

5.  Data Element Definitions

Begin-Date-Range is the latest date for finding documents in a query
where Date-Factor is DF_LATER or DF_SPECIFIED_RANGE.  Dates are ASCII,
of the form yyyymmdd.

Best-Match  is  the  approximate  byte  offset  within  a  document  of  the 
highest-scoring  portion  of  the  document. 

Chunk-Code specifies the size of chunks used in document regions.  The
default value is 1.  In DowQuest two Chunk-Codes are supported:
DowQuest will provide Chunk-Code=3 (Paragraph-ID) for relevance
feedback Type-3 Queries in a Search APDU, and Chunk-Code=2 (Line) for
text retrieval Type-1 Queries in a Search APDU.  Chunk-Code=1 (Byte)
is the most general case.  With this chunk size, Chunk-Marker and
Chunk-ID-Length are not used.  The origin may indicate ranges of a
document in bytes by setting Chunk-Code=1 and providing pairs of
byte-offsets in a relevance feedback Type-3 query.  Otherwise, the
origin indicates chunk ranges by specifying Chunk-Start-ID and
Chunk-End-ID.

Chunk-End-ID  —  see  Chunk-Start-ID. 

Chunk-ID-Length  specifies  how  many  bytes  Chunk-IDs  will  be.   In 
DowQuest  Chunk-ID-Length  for  paragraphs  is  3  bytes.   The  contents  of  a 
Chunk-ID  is  opaque  to  the  origin  system.   The  value  is  used  unchanged 
when  specifying  a  chunk  range  in  a  relevance  feedback  Type-3  query. 

Chunk-Marker specifies an ASCII byte sequence that will occur in the
document text as a delimiter for the start of a chunk (except
Chunk-Code=1 (Byte), which has no markers).  In DowQuest Chunk-IDs for
paragraphs are preceded by "<ESC>l", which is a two-byte Chunk-Marker.

Chunk-Start-ID  and  Chunk-End-ID  are  either  Chunk-IDs  (type  ANY)  that 
were  each  marked  with  a  Chunk-Marker  in  the  text  of  a  document 
returned  in  a  Search-Response  APDU;  or,  if  Chunk-Code=l,  they  are 
integers  containing  byte  offsets  in  the  text  of  the  document.   They 
delimit  the  beginning  and  end  of  a  user-selected  relevant  region  of 
the  document  to  be  used  for  a  relevance  feedback  query. 

Company-Codes  contains  ASCII  codes  describing  companies  that  are 
mentioned  in  a  document. 

Date is the ASCII date a document was published (yyyymmdd).

Date-Factor is one of: 1 "DF_INDEPENDENT", 2 "DF_LATER", 3
"DF_EARLIER", or 4 "DF_SPECIFIED_RANGE".  The default is
Date-Factor=1, which specifies no special weighting of dates.  The
other 3 values specify bonus scoring for documents with dates greater
than, less than, or between specified dates, respectively.
Date-Factor=2 uses Begin-Date-Range, Date-Factor=3 uses
End-Date-Range, and Date-Factor=4 uses both.
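
The four Date-Factor values and the date elements each one relies on
can be summarized in a short sketch.  The enumeration values come from
the definition above; the function name required_date_elements is a
hypothetical helper for illustration only.

```python
from enum import IntEnum


class DateFactor(IntEnum):
    """Date-Factor values as listed in the definition above."""
    DF_INDEPENDENT = 1
    DF_LATER = 2
    DF_EARLIER = 3
    DF_SPECIFIED_RANGE = 4


def required_date_elements(factor):
    """Return the date-range elements a given Date-Factor relies on."""
    return {
        DateFactor.DF_INDEPENDENT: (),
        DateFactor.DF_LATER: ("Begin-Date-Range",),
        DateFactor.DF_EARLIER: ("End-Date-Range",),
        DateFactor.DF_SPECIFIED_RANGE: ("Begin-Date-Range", "End-Date-Range"),
    }[DateFactor(factor)]
```

An origin could use such a table to check that a Type-3 query carries
the date elements its Date-Factor needs before sending it.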

De-Highlight-Marker: see Highlight-Marker.

Document-ID is a field that was previously returned in a
Search-Response APDU.  It is unique in the database being searched.
It must be used in a Search APDU exactly as it was returned in a
Search-Response APDU.  See Document-ID-Chunk.

Document-ID-Chunk is the same as a Document-ID element, except that it
must be followed by two or three chunk elements defining a fragment of
the document: Chunk-Code, Chunk-Start-ID, Chunk-End-ID.  Chunk-Code is
optional; if Chunk-Code is missing, the previous value of Chunk-Code
in the current APDU is used; or if Chunk-Code never appeared in this
APDU, the default value is Chunk-Code=1 (Byte).
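
The rule for an omitted Chunk-Code can be sketched as a single pass
over the Document-ID-Chunk elements of one APDU.  The function name
and the list-of-optional-integers input are assumptions made for
illustration; they are not part of the protocol.

```python
DEFAULT_CHUNK_CODE = 1  # "Byte", the default named in the definition above


def effective_chunk_codes(chunk_codes):
    """Resolve the Chunk-Code in force for each Document-ID-Chunk.

    `chunk_codes` lists, in APDU order, the explicit Chunk-Code of each
    Document-ID-Chunk element, or None where the element omitted it.
    """
    resolved = []
    current = DEFAULT_CHUNK_CODE
    for code in chunk_codes:
        if code is not None:
            current = code  # an explicit value replaces the carried one
        resolved.append(current)
    return resolved
```

For example, effective_chunk_codes([None, 3, None]) yields [1, 3, 3]:
the first element falls back to the default and the third inherits the
value set by the second.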

Document-Length is the length of the entire document in bytes.

Document-Text is a portion of a document text.

End-Date-Range is the earliest date for finding documents in a query
where Date-Factor is DF_EARLIER or DF_SPECIFIED_RANGE.  Dates are
ASCII, of the form yyyymmdd.

Headline is a short ASCII description of the document for presentation
to the user.  In DowQuest it is a maximum of 160 bytes [??? is this a
requirement?].

Highlight-Marker and De-Highlight-Marker are character sequences that
precede and follow text that may be displayed with highlighting.  In
DowQuest, every searchable term is preceded by "<DC1>" (0x11) and
followed by "<DC3>" (0x13).
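
A display routine on the origin might separate highlighted terms from
plain text roughly as follows.  This sketch assumes single-character
markers with the DowQuest values given above; in practice the markers
would be taken from the Init-Response APDU, and the function name
split_highlights is hypothetical.

```python
HIGHLIGHT = "\x11"     # <DC1>, the DowQuest value stated above
DE_HIGHLIGHT = "\x13"  # <DC3>, the DowQuest value stated above


def split_highlights(text, start=HIGHLIGHT, end=DE_HIGHLIGHT):
    """Split marked-up text into (fragment, is_highlighted) pairs.

    The marker characters themselves are dropped from the output.
    """
    parts = []
    buffer = []
    highlighted = False
    for ch in text:
        if ch == start or ch == end:
            if buffer:
                parts.append(("".join(buffer), highlighted))
                buffer = []
            highlighted = (ch == start)
        else:
            buffer.append(ch)
    if buffer:
        parts.append(("".join(buffer), highlighted))
    return parts
```

The resulting pairs can be rendered with whatever highlighting the
workstation supports, or joined without styling to recover plain text.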

Industry-Codes  contains  ASCII  codes  describing  industries  that  are 
mentioned  in  a  document. 

Max-Documents-Retrieved is the maximum number of documents requested
by the origin in a Search APDU to be returned in a Search-Response
APDU.  In DowQuest the default value is 16 [??? probably should not
have a default value?].  The target may return fewer than
Max-Documents-Retrieved documents.

Newline-Characters indicates what characters are used at the end of
lines.  In DowQuest this is "<CR>" (0x0D).

Origin-City  is  an  ASCII  name  of  the  city  and/or  country  where  a 
document  originated. 

Present-Chunk-Code-Bitmap is a bitmap indicating what Chunk-Codes may
be used in a Present APDU to specify a text range of a document to be
returned.  See Search-Chunk-Code-Bitmap for its definition.  [??? This
is obsolete.  Chunk-Codes must be worked out more completely.]

Score is a measure of how well the document matched the query.  It may
be any integer value.  [??? We may need to define a valid score range
to be used by all targets, or add a field in the Init-Response APDU to
specify the range for the current target.]

Search-Chunk-Code-Bitmap is a bitmap indicating what Chunk-Codes may
be used in a Search APDU query to specify a range of a document.  The
bitmap indicates with a "1" in a bit position that the corresponding
code number will be accepted by the target system.  For example, to
indicate that the target accepts Chunk-Codes 1 and 3 in a
Search APDU it would return Search-Chunk-Code-Bitmap with bits 1 and
3 set to 1 and all other bits 0.

Seed-Words  is  a  text  string  containing  the  initial  seed  words  in  a 
relevance  feedback  Type-3  query. 

Seed-Words-Used is the same format as Seed-Words except it contains
only words that actually matched some documents in the database.  This
allows the user interface to give the user feedback about which seed
words were effective in a query.

Source is an ASCII string identifying the original source of a
document (e.g. newspaper name, journal title, etc.).

Stock-Codes  contains  ASCII  stock  ticker  codes  for  companies  that  are 
mentioned  in  a  document. 

Text-List  is  a  list  of  text  strings  that  are  provided  by  the  user. 
They  are  document  fragments  that  come  from  outside  the  DowQuest 
database  which  the  user  wants  to  use  in  a  search.   They  are  processed 
in  the  same  manner  as  seed  words  except  they  are  not  given  seed  word 
weight  bonuses.   **This  would  be  a  new  feature  of  a  query  within 
DowQuest,  and  would  require  changes  to  the  Query  Server  as  well  as  the 
User  Server  portion  of  DowQuest.   It  will  not  be  implemented  for  the 
June  '90  prototype. 

User-Information-Length  is  the  length  of  the  entire  user  information 
part  of  an  APDU  when  it  consists  of  more  than  one  element. 
User-Information-Length  does  not  include  itself  in  the  length. 

Version-Number is used to validate a local copy of a document's text.
If a document is modified in the target server, its Version-Number
must be incremented.  If a document may not be cached, Version-Number
is set to 0.  The default value is 0.
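
An origin that caches document text might apply the Version-Number
rule along these lines.  The function name is hypothetical, and the
sketch assumes the origin has just received the current Version-Number
in a Search-Response APDU.

```python
def cached_copy_is_valid(cached_version, current_version):
    """Decide whether a locally cached document text may be reused."""
    if cached_version == 0 or current_version == 0:
        return False  # 0 means the document may not be cached
    # Any modification on the target increments the Version-Number,
    # so only an exact match indicates an unchanged document.
    return cached_version == current_version
```

When the check fails, the origin would fetch the text again with a
Type-1 query on the document's System-Control-Number.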


5.1  Tag Values of the Data Elements

This table is an extension to Table 19 in section 4.1.3 of the
standard.


Element                          Tag   PDU               R/O/C

User-Information-Length [???]    99    Init-Response     C
                                       Search            C
                                       Search-Response   C
Chunk-Code                       100   Search            O
Chunk-ID-Length                  101   Init-Response     C
Chunk-Marker                     102   Init-Response     C
Highlight-Marker                 103   Init-Response     O
De-Highlight-Marker              104   Init-Response     C
Newline-Characters               105   Init-Response     O
Seed-Words                       106   Search            C
Document-ID-Chunk                107   Search            O
Chunk-Start-ID                   108   Search            O
Chunk-End-ID                     109   Search            C
Text-List                        110   Search            O
Date-Factor                      111   Search            O
Begin-Date-Range                 112   Search            O
End-Date-Range                   113   Search            C
Max-Documents-Retrieved          114   Search            R
Seed-Words-Used                  115   Search-Response   O
Document-ID                      116   Search            O
                                       Search-Response   R
Version-Number                   117   Search-Response   O
Score                            118   Search-Response   O
Best-Match                       119   Search-Response   O
Document-Length                  120   Search-Response   R
Source                           121   Search-Response   O
Date                             122   Search-Response   O
Headline                         123   Search-Response   C
Origin-City                      124   Search-Response   O
Search-Chunk-Code-Bitmap         125   Search            O
Present-Chunk-Code-Bitmap [???]  126   Search            O
Document-Text                    127   Search-Response   R
Stock-Codes                      128   Search-Response   O
Company-Codes                    129   Search-Response   O
Industry-Codes                   130   Search-Response   O


Appendix  A.  Type-3  Query  (Relevance  Feedback) 

Query syntax is not part of the Z39.50 specification, but a Type-1
query is suggested in Appendix B of the standard for Boolean queries.
This is a similar suggestion for relevance feedback queries.

The Type-3 Query supports the relevance feedback style of database
query (as provided by DowQuest).  The Type-3 query includes the
following elements:


Seed-Words                  R       ASCII
Document-ID                 O       ANY     (see Note 1 below)
Document-ID-Chunk           O       ANY     (see Note 2 below)
  Chunk-Code                O       binary
  Chunk-Start-ID            C       if Chunk-Code=1, binary
                                    else ANY
  Chunk-End-ID              C       if Chunk-Code=1, binary
                                    else ANY
(may repeat Document-ID and Document-ID-Chunk elements)
Text-List                   O       ASCII   (Not in DowQuest)
Date-Factor                 O       integer
Begin-Date-Range            C       ASCII
End-Date-Range              C       ASCII
Max-Documents-Retrieved     R       integer

Note 1: There may be any number of Document-ID and Document-ID-Chunk
elements in a Type-3 Query, intermixed.

Note 2: Each occurrence of a Document-ID-Chunk element must be
followed by two or three chunk elements, defining a fragment of the
document.


Appendix  B.  Sample  APDUs  in  WAIS  Demonstration  System 

In the following, binary values are shown in hexadecimal preceded by
0x.  Variable fields include a tag and length [see A.1.2.1, A.1.2.2,
and Table 19].  See section 5.1 of this document for tag values for
WAIS elements.
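
The tag-and-length layout of variable fields can be sketched as below.
The one-byte tag followed by a one-byte length is an assumption read
off the sample values in this appendix (for example 0x6A05"Apple");
the rules in A.1.2.1 and A.1.2.2 of the standard are authoritative and
may allow other forms.  Both function names are hypothetical.

```python
def encode_element(tag, value):
    """Encode one variable field as tag byte, length byte, then value."""
    if not 0 <= tag <= 0xFF or len(value) > 0xFF:
        raise ValueError("this sketch assumes one-byte tags and lengths")
    return bytes([tag, len(value)]) + value


def decode_elements(data):
    """Decode concatenated tag/length/value fields into (tag, value) pairs."""
    elements = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated element header")
        tag, length = data[pos], data[pos + 1]
        value = data[pos + 2 : pos + 2 + length]
        if len(value) != length:
            raise ValueError("truncated element value")
        elements.append((tag, value))
        pos += 2 + length
    return elements
```

With these assumptions, encode_element(0x6A, b"Apple") produces the
bytes 0x6A 0x05 followed by the five ASCII characters of "Apple", the
same shape as the Seed-Words field in the samples.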


B.1  Init APDU
[see Table 7, Table A2]

ITEM                            BYTE POS.  VALUE              NOTE

Header-Length-Indicator         1-2        0x0015             21
Header:
  Fixed portion:
    PDU-Type                    3          0x14               20
  Variable Portion:
    Protocol-Version            4-6        0x030101           1
    Options                     7-9        0x0401C0           bit 1,2
    Preferred-Message-Size      10-13      0x05020400         1024
    Maximum-Record-Size         14-17      0x06020800         2048
    Reference-ID                18-23      0x020400000001     1
User information part:
    (none)


B.2  Init-Response APDU
[see Table 8, Table A3]

ITEM                            BYTE POS.  VALUE              NOTE

Header-Length-Indicator         1-2        0x0025             37
Header:
  Fixed portion:
    PDU-Type                    3          0x15               21
    Result                      4          0x01               1="accept"
  Variable Portion:
    Protocol-Version            5-7        0x030101           1
    Options                     8-10       0x0401C0           bit 1,2
    Preferred-Message-Size      11-14      0x05020400         1024
    Maximum-Record-Size         15-18      0x06020400         1024
    Implementation-Name         19-28      0x0908"DowQuest"
    Implementation-Version      29-33      0x1003"1.0"
    Reference-ID                34-39      0x020400000001     1
User-Information-Field          40-42      0x??0217           ??
    Search-Chunk-Code-Bitmap    43-45      0x7D0140           bit 2
    Present-Chunk-Code-Bitmap?? 46-48      0x7E0180           bit 1
    Chunk-ID-Length             49-51      0x650103           3
    Chunk-Marker                52-55      0x66021B6C         "<ESC>l"
    Highlight-Marker            56-58      0x670111           "<DC1>"
    De-Highlight-Marker         59-61      0x680112           "<DC2>"
    Newline-Characters          62-65      0x69020D0A         "<CR><LF>"


B.3  Search  APDU 

[see  Table  9,  Table  A4] 

B.3.1  Example query containing only Seed-Words element (no
Document-ID):

ITEM                            BYTE POS.  VALUE              NOTE

Header-Length-Indicator         1-2        0x0018             24
Header:
  Fixed portion:
    PDU-Type                    3          0x16               22
    Small-Set-Upper-Bound       4-6        0x000400           1024
    Large-Set-Lower-Bound       7-9        0x000800           2048
    Medium-Set-Present-Number   10-12      0x000800           2048
    Replace-Indicator           13         0x01               1="on"
  Variable Portion:
    Result-Set-Name             14-15      0x1100
    Database-Names              16-17      0x1200
    Query-Type                  18-20      0x130133           "3"
    Reference-ID                21-26      0x020400000002     2
User-Information-Field          27-29      0x??0224           36
  Type-3 Query:
    Seed-Words                  30-62      0x6A1F"Tell me about
                                           Thinking Machines"
    Max-Documents-Retrieved     63-65      0x720110           16
    [??? remove this field; use Small-Set-Upper-Bound or something...]



B.3.2  Example query containing Seed-Words, one Document-ID and
one Document-ID-Chunk element.  This query includes seed word
"Apple," and specifies using all of document 00000001WJ in the
search, and paragraphs with IDs 005 through 007 from document
00000023WJ:

ITEM                            BYTE POS.  VALUE              NOTE

Header-Length-Indicator         1-2        0x0018             24
Header:
  Fixed portion:
    PDU-Type                    3          0x16               22
    Small-Set-Upper-Bound       4-6        0x000400           1024
    Large-Set-Lower-Bound       7-9        0x000800           2048
    Medium-Set-Present-Number   10-12      0x000800           2048
    Replace-Indicator           13         0x01               1="on"
  Variable Portion:
    Result-Set-Name             14-15      0x1100             ""
    Database-Names              16-17      0x1200             ""
    Query-Type                  18-20      0x130133           "3"
    Reference-ID                21-26      0x020400000003     3
User-Information-Field          27-29      0x??0230           48
  Type-3 Query:
    Seed-Words                  30-36      0x6A05"Apple"
    Max-Documents-Retrieved     37-39      0x720110           16
    [??? remove this field; use Small-Set-Upper-Bound or something...]
    Document-ID                 40-51      0x740A00000001WJ
    Document-ID-Chunk           52-63      0x740A00000023WJ
    Chunk-Code                  64-66      0x640102           paragraph
    Chunk-Start-ID              68-72      0x6C03"005"        par ID=005
    Chunk-End-ID                73-77      0x6D03"007"        par ID=007



B.4  Search-Response APDU
[see Table 10, Table A5]

ITEM                            BYTE POS.  VALUE                      NOTE

Header-Length-Indicator         1-2        0x0014                     20
Header:
  Fixed portion:
    PDU-Type                    3          0x17                       23
    Search-Status               4          0x00                       0="success"
    Result-Count                5-7        0x000002                   2
    Number-of-Records-Returned  8-10       0x000002                   2
    Next-Result-Set-Position    11-13      0x000000                   0
  Variable Portion:
    Present-Status              14-16      0x1B0100                   0="success"
    Reference-ID                17-22      0x020400000002             2
User-Information-Field          23-25      0x??01DD                   221
    Seed-Words-Used             26-44      0x7311"Thinking Machines"
Database records:
  Document-Header element set:
    Document-ID                 45-58      0x740C"0000000001WJ"
    Version-Number              59-61      0x750100                   0
    Score                       62-67      0x760400000022             34
    Best-Match                  68-77      0x77080000000000000001
    Document-Length             78-87      0x78080000000000000033
    Source                      88-92      0x7903"WSJ"
    Date                        93-100     0x7A06"900601"             yymmdd *
    Headline                    101-109    0x7B11"TMC Releases WAIS"
    Origin-City                 110-124    0x7C0D"Cambridge, MA"

    Document-ID                 125-138    0x740C"0000000123ZF"
    Version-Number              139-141    0x750100                   0
    Score                       142-147    0x760400000015             21
    Best-Match                  148-157    0x7708000000000000006E
    Document-Length             158-167    0x78080000000000000121
    Source                      168-182    0x790D"Business Week"
    Date                        183-190    0x7A06"900603"
    Headline                    191-211    0x7B13"Apple Releases WAIS"
    Origin-City                 212-226    0x7C0D"Cupertino, CA"

(*) A Date element should actually be yyyymmdd


Appendix  C.  DowQuest  Code  Formats 

C.1  Company Codes
[???  TBD] 

C.2  Industry  Codes 
[???  TBD] 

C.3  Stock  Codes 
[???  TBD] 


